KEP-3243: Update the design to mutate the label selector based on matchLabelKeys at api-server instead of the scheduler handling it (#5033)
Conversation
/cc @sanposhiho

/cc @alculquicondor @wojtek-t (or @alculquicondor) Do we need PRR review in this case? (I suppose yes.)

/retitle KEP-3243: Update the design to mutate the label selector based on matchLabelKeys at api-server instead of the scheduler handling it

cc @dom4ha
sanposhiho left a comment:

Update the content based on the conclusion in kubernetes/kubernetes#129480 (comment)
sanposhiho left a comment:

You need to update other sections such as Test Plan and other PRR questions.
I appreciate your comment.
Force-pushed 71f1c8c to c6e0a76 (compare)
/assign @wojtek-t for the PRR review. Please assign another person if needed.

Queued, although I will wait for the SIG approval to happen first.
@sanposhiho I have a question.
Is my understanding correct? For example, if the following Pod manifest is applied:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample
  labels:
    app: sample
...
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector: {}
    matchLabelKeys:
    - app
```

kube-apiserver will usually merge the key-value labels corresponding to `matchLabelKeys` into `labelSelector`:

```yaml
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - sample
    matchLabelKeys:
    - app
```

It is the same if the user sets the key-value labels corresponding to `matchLabelKeys` in `labelSelector` themselves:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sample
  labels:
    app: sample
...
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values:
        - sample
    matchLabelKeys:
    - app
```

However, what if the key-value labels corresponding to `matchLabelKeys` differ from the existing `labelSelector`?

```yaml
  topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: kubernetes.io/hostname
    whenUnsatisfiable: DoNotSchedule
    labelSelector:
      matchExpressions:
      - key: foo
        operator: In
        values:
        - bar
    matchLabelKeys:
    - app
```
... and you know what, I found that I forgot about the update path issue that I raised myself while writing this comment. 😅 So, answering your questions:
Usually, no. As my strikethrough-ed comment above argued, we can just assume that kube-apiserver handles all matchLabelKeys that users added, and the scheduler only has to worry about the cluster-wide default constraints.
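To make the merge discussed above concrete, here is a small self-contained sketch of what kube-apiserver would do at pod creation. The type and function names are illustrative stand-ins, not the real Kubernetes API types (which live in `k8s.io/apimachinery`); the logic mirrors the KEP's description: each key in `matchLabelKeys` is looked up on the incoming pod's labels and ANDed into the constraint's `labelSelector` as an `In` requirement.

```go
package main

import "fmt"

// Requirement and LabelSelector are simplified stand-ins for the
// corresponding Kubernetes API types (metav1.LabelSelectorRequirement,
// metav1.LabelSelector); names are illustrative only.
type Requirement struct {
	Key      string
	Operator string
	Values   []string
}

type LabelSelector struct {
	MatchExpressions []Requirement
}

// mergeMatchLabelKeys sketches the kube-apiserver side of the design:
// for each key in matchLabelKeys, look up the value on the incoming
// pod's labels and AND it into the constraint's labelSelector.
// Keys missing from the pod's labels are skipped.
func mergeMatchLabelKeys(sel *LabelSelector, podLabels map[string]string, matchLabelKeys []string) {
	for _, key := range matchLabelKeys {
		value, ok := podLabels[key]
		if !ok {
			continue
		}
		sel.MatchExpressions = append(sel.MatchExpressions, Requirement{
			Key:      key,
			Operator: "In",
			Values:   []string{value},
		})
	}
}

func main() {
	sel := &LabelSelector{}
	mergeMatchLabelKeys(sel, map[string]string{"app": "sample"}, []string{"app"})
	fmt.Printf("%+v\n", sel.MatchExpressions)
}
```

Under this sketch, the empty `labelSelector: {}` from the first manifest ends up with a single `app In [sample]` requirement, matching the merged manifest shown above.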
Force-pushed f9e848a to eea447d (compare)
@sanposhiho
Force-pushed 02b3f71 to 820dcf8 (compare)
Force-pushed a38566f to 688ecc1 (compare)
@sanposhiho
> 2. Even if it happens, the topology spread on the pod is just ignored (since `labelSelector` no longer matches the pod itself)
> and the pod could still be schedulable.
Is it true? Even if the labelSelector no longer matches the pod itself, the topology spreading is not ignored and could still make the pod unschedulable (ofc selfMatchNum will be 0). If it's what you mean, I'd rephrase it.
Wouldn't the label change simply be ignored? Does the pod itself need to be covered by the selector? I think the bigger problem is that, since we'd have both implementations (in api-server and scheduler), won't the scheduler add the new label to the labelSelector and make the pod completely unschedulable?
> Is it true? Even if the labelSelector no longer matches the pod itself, the topology spreading is not ignored and could still make the pod unschedulable (ofc selfMatchNum will be 0). If it's what you mean, I'd rephrase it.
Yeah, sorry, it was my suggestion and you're right: "ignored" was misleading and we should rephrase it. matchNum - minMatchNum could still be bigger than maxSkew, and nodes in some domains could be unschedulable, depending on the current number of matching pods there.
> won't the scheduler add the new label to the labelSelector and make the pod completely unschedulable?
The scheduler adds, but the pod won't be completely unschedulable. Rather, the label addition in this case makes the pod easier to get scheduled because selfMatchNum would be 0.
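To make the skew arithmetic in this thread concrete, here is a self-contained sketch (plain Go with illustrative names, not the actual PodTopologySpread plugin code) of the `DoNotSchedule` check: a pod may be placed into a domain only if `matchNum + selfMatchNum - minMatchNum <= maxSkew`. It shows both points made above: `selfMatchNum = 0` makes placement easier, yet a crowded domain can still exceed `maxSkew`.

```go
package main

import "fmt"

// domainAllowed sketches the DoNotSchedule feasibility check for placing
// a pod into `domain`. matchCounts holds the number of existing matching
// pods per topology domain; selfMatch is 1 when the merged labelSelector
// still matches the incoming pod itself, 0 otherwise (e.g. after a label
// update). The skew after placement must not exceed maxSkew.
func domainAllowed(matchCounts map[string]int, domain string, selfMatch, maxSkew int) bool {
	minMatch := -1
	for _, n := range matchCounts {
		if minMatch < 0 || n < minMatch {
			minMatch = n
		}
	}
	return matchCounts[domain]+selfMatch-minMatch <= maxSkew
}

func main() {
	counts := map[string]int{"node-a": 2, "node-b": 0}
	// With selfMatch=1, node-a would reach skew 2+1-0=3 > maxSkew=1.
	fmt.Println(domainAllowed(counts, "node-a", 1, 1)) // false
	// With selfMatch=0, node-a still exceeds maxSkew: 2+0-0=2 > 1,
	// so the constraint is not simply "ignored".
	fmt.Println(domainAllowed(counts, "node-a", 0, 1)) // false
	// The emptier domain stays feasible.
	fmt.Println(domainAllowed(counts, "node-b", 0, 1)) // true
}
```

So a label addition by the scheduler lowers `selfMatchNum` to 0 and can only relax, never tighten, this check for the pod being scheduled.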
Thank you for your comments.
I've tried to rephrase this sentence.
> #### the update to labels specified at `matchLabelKeys` isn't supported
>
> `MatchLabelKeys` is handled and merged into `LabelSelector` at _a pod's creation_.
> It means this feature doesn't support the label's update even though users, of course,
Suggested change:

```diff
-It means this feature doesn't support the label's update even though users, of course,
+It means this feature doesn't support the label's update even though a user
+could update the label that is specified at `matchLabelKeys` after a pod's creation.
```
> `TopologySpreadConstraint` which represent a set of label keys only. The scheduler
> will use those keys to look up label values from the incoming pod; and those key-value
> labels are ANDed with `LabelSelector` to identify the group of existing pods over
> `TopologySpreadConstraint` which represent a set of label keys only.
Suggested change:

```diff
-`TopologySpreadConstraint` which represent a set of label keys only.
+`TopologySpreadConstraint` which represents a set of label keys only.
```
> could update the label that is specified at `matchLabelKeys` after a pod's creation.
> So, in such cases, the update of the label isn't reflected onto the merged `LabelSelector`,
> even though users might expect it to be.
> On the documentation, we'll declare it's not recommended using `matchLabelKeys` with labels that is frequently updated.
I'd rather say that updating the labels used is not recommended at all (instead of just "not frequently"), as it would break the original placement of such pods.
> to `labelSelector` of `topologySpreadConstraints` to filter and group pods.
> The pods belonging to the same group will be part of the spreading in
> `PodTopologySpread`.
> In addition, kube-scheduler handles `matchLabelKeys` if the cluster-level default constraints is configured with `matchLabelKeys`.
Isn't it only a proposal? kubernetes/kubernetes#129198 (comment)
> Also, we cannot implement this feature only within kube-apiserver because it'd make it
> impossible to handle `MatchLabelKeys` within the cluster-level default constraints in the scheduler configuration.
>
> So we decided to go with the current design that implements this feature within both
Suggested change:

```diff
-So we decided to go with the current design that implements this feature within both
+So we decided to go with the design that implements this feature within both
```
> But, it may confuse users because this behavior would be different from PodAffinity's `MatchLabelKeys`.
>
> Also, we cannot implement this feature only within kube-apiserver because it'd make it
> impossible to handle `MatchLabelKeys` within the cluster-level default constraints in the scheduler configuration.
Suggested change:

```diff
-impossible to handle `MatchLabelKeys` within the cluster-level default constraints in the scheduler configuration.
+impossible to handle `MatchLabelKeys` within the cluster-level default constraints in the scheduler configuration in the future (see https://github.com/kubernetes/kubernetes/issues/129198).
```
> labels are ANDed with `LabelSelector` to identify the group of existing pods over
> `TopologySpreadConstraint` which represent a set of label keys only.
> kube-apiserver will use those keys to look up label values from the incoming pod
> and those key-value labels are ANDed with `LabelSelector` to identify the group of existing pods over
Suggested change:

```diff
-and those key-value labels are ANDed with `LabelSelector` to identify the group of existing pods over
+and those key-value labels will be merged with existing `LabelSelector` to identify the group of existing pods over
```
> will use those keys to look up label values from the incoming pod; and those key-value
> labels are ANDed with `LabelSelector` to identify the group of existing pods over
> `TopologySpreadConstraint` which represent a set of label keys only.
> kube-apiserver will use those keys to look up label values from the incoming pod
Suggested change:

```diff
-kube-apiserver will use those keys to look up label values from the incoming pod
+At a pod creation, kube-apiserver will use those keys to look up label values from the incoming pod
```
> kube-apiserver will use those keys to look up label values from the incoming pod
> and those key-value labels are ANDed with `LabelSelector` to identify the group of existing pods over
> which the spreading skew will be calculated.
> kube-scheduler will also handle it if the cluster-level default constraints have the one with `MatchLabelKeys`.
Suggested change:

```diff
-kube-scheduler will also handle it if the cluster-level default constraints have the one with `MatchLabelKeys`.
+Note that in case `MatchLabelKeys` is supported in the cluster-level default constraints (see https://github.com/kubernetes/kubernetes/issues/129198), kube-scheduler will also handle it separately.
```
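For context on what "cluster-level default constraints" refers to here: they are configured in the scheduler's `PodTopologySpread` plugin args. If `matchLabelKeys` were allowed there (the open proposal in kubernetes/kubernetes#129198; it is not supported as of this discussion), a configuration might look like the following sketch. The `matchLabelKeys` entry under `defaultConstraints` is the hypothetical, proposed part; the rest is the existing `KubeSchedulerConfiguration` shape.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: PodTopologySpread
    args:
      defaultingType: List
      defaultConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: ScheduleAnyway
        # Proposed in kubernetes/kubernetes#129198: because kube-apiserver
        # never sees these defaults, kube-scheduler would have to resolve
        # matchLabelKeys for them itself.
        matchLabelKeys:
        - app
```

This is why the design keeps a scheduler-side implementation in addition to the kube-apiserver one.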
lgtm from me. Let's wait for @dom4ha.
alculquicondor left a comment:

/approve
Time is running out, so better to approve now.
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: alculquicondor, mochizuki875, sanposhiho. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files.
Approvers can indicate their approval by writing `/approve` in a comment.
/lgtm
@wojtek-t We haven't modified the PRR file, and it was merged without your approval.
No further concerns from my side. I have limited availability this week, so thanks @macsko and @alculquicondor for the approvals.